A feature selection method based on the Golden Jackal-Grey Wolf Hybrid Optimization Algorithm

This paper proposes a feature selection method based on a hybrid optimization algorithm that combines the Golden Jackal Optimization (GJO) and Grey Wolf Optimizer (GWO). The primary objective of this method is to create an effective data dimensionality reduction technique for eliminating redundant, irrelevant, and noisy features within high-dimensional datasets. Drawing inspiration from the Chinese idiom “Chai Lang Hu Bao,” hybrid algorithm mechanisms, and cooperative behaviors observed in natural animal populations, we amalgamate the GWO algorithm, the Lagrange interpolation method, and the GJO algorithm to propose the multi-strategy fusion GJO-GWO algorithm. In Case 1, the GJO-GWO algorithm addressed eight complex benchmark functions. In Case 2, GJO-GWO was utilized to tackle ten feature selection problems. Experimental results consistently demonstrate that under identical experimental conditions, whether solving complex benchmark functions or addressing feature selection problems, GJO-GWO exhibits smaller means, lower standard deviations, higher classification accuracy, and reduced execution times. These findings affirm the superior optimization performance, classification accuracy, and stability of the GJO-GWO algorithm.


Introduction
With continued advances in computing and storage hardware, and their widespread use in fields such as finance, social media, and biomedicine, unstructured data of many forms has grown exponentially [1]. However, such data typically contains numerous redundant, irrelevant, and noisy features, which complicates subsequent data mining and scientific research [2]. Identifying optimal feature subsets within these data collections is therefore essential for subsequent data engineering research.
Feature selection (FS) reduces the dimensionality of data by eliminating redundant, irrelevant, and noisy features from the original dataset while endeavoring to retain all essential attributes [3-5]. Based on differences in search and evaluation techniques, feature selection is traditionally classified into three categories: filter, wrapper, and embedded methods [6,7]. Filter and wrapper methods involve a trade-off: filter methods are computationally more efficient, whereas wrapper methods achieve better selection results by incorporating feedback from the classification model [8]. Consequently, this paper focuses on the wrapper-type feature subset search process.
The crux of the feature selection problem lies in searching for and evaluating feature subsets [9]. The search can be viewed as a combinatorial optimization problem, traditionally approached through exhaustive or heuristic techniques. However, the continual accumulation of data has given rise to the "curse of dimensionality" [10], rendering exhaustive search and traditional heuristic techniques impractical [11]. Selecting informative feature subsets from high-dimensional data therefore remains challenging and calls for effective feature selection methods that efficiently reduce the original data to a lower-dimensional space [12,13].
Recently, metaheuristic algorithms have emerged as a preferred tool for tackling combinatorial optimization problems [8,14-16]. These algorithms are valued for their simple heuristics, robust global search capabilities, and insensitivity to parameter settings, which make them versatile across domains [17-19]. Used as search strategies for feature subset selection, they help circumvent the issues associated with traditional optimization methods.
The metaheuristic algorithms mentioned above have each made valuable contributions to feature selection, capitalizing on their respective strengths. Nevertheless, according to the No-Free-Lunch theorem [82], no single algorithm can solve all optimization problems well. This motivates a continuing search for more advanced and versatile algorithms.
The Golden Jackal Optimization (GJO) algorithm [83] has emerged as a promising contender among these metaheuristic algorithms. GJO draws inspiration from the hunting behavior of golden jackals and is known for its small number of parameters, fast search, and strong global exploration. It has been applied across a range of complex problem domains. However, the basic GJO algorithm is prone to becoming trapped in local optima and to reduced solution precision on intricate optimization problems. One motivation of this study is therefore to enhance the GJO algorithm and improve its optimization performance. A second motivation is to explore the application of the enhanced algorithm to feature selection problems. Specifically, the main contributions of this paper are as follows:
1. Drawing inspiration from the Chinese idiom "Chai Lang Hu Bao" and the principles of hybrid algorithm mechanisms, we incorporated the leadership strategy of the head wolf and the hierarchical structure from GWO into the GJO algorithm. This integration diversifies the solutions during the algorithm's iterations. The increased solution diversity improves the GJO algorithm's ability to escape local optima and thus reinforces its global exploration capability.
2. Drawing inspiration from the collaborative mechanisms observed in natural populations, we introduced the Lagrange interpolation method to update the population's positions within the GJO algorithm. This addition aims to enhance the algorithm's convergence accuracy. The new population updating mechanism strengthens the algorithm's local exploitation capability.
3. We combined the GWO algorithm, the Lagrange interpolation method, and the GJO algorithm into the multi-strategy fusion GJO-GWO algorithm. We then integrated this algorithm with the KNN classifier to address feature selection problems.
4. We applied the proposed GJO-GWO algorithm to eight benchmark functions and ten feature selection problems. Experimental results indicate that, under identical experimental conditions, the GJO-GWO algorithm exhibits superior optimization performance, classification performance, and stability.

Algorithm | Technique | Methodology | Result
GA [29] | Genetic Algorithm | GA+KNN | The GA+KNN approach achieves 100% accuracy on a lung cancer database.
PSO [74] | Particle Swarm Optimization | SS-PSO | SS-PSO is a highly competitive method for high-dimensional FS.
ACO [36] | Ant Colony Optimization | AMFSA | AMFSA obtains an excellent feature subset with high classification efficiency on 15 multilabel datasets.
GWO [45] | Grey Wolf Optimization | ABGWO | ABGWO has the most advantages in classification accuracy, feature subset size, and calculation time.
ABC [37] | | |
HHO [65] | Harris Hawks Optimizer | LIL-HHO | LIL-HHO shows superior performance in most cases relative to the basic HHO and other compared metaheuristic algorithms.
GOA [70] | Grasshopper Optimization Algorithm | LAGOA | LAGOA has been applied to standard disease datasets from the UCI machine learning repository, and the results show its superiority over other state-of-the-art methods on these datasets.
SMA [81] | | |

The remainder of this study is organized as follows. Section 2 introduces the standard GJO algorithm. Section 3 presents the multi-strategy fusion GJO-GWO algorithm. Section 4 examines the search and optimization performance of the GJO-GWO algorithm on complex benchmark functions. Section 5 investigates the convergence and classification performance of the feature selection method based on GJO-GWO. Finally, Section 6 summarizes the current research and discusses potential future research directions.

The GJO algorithm
The Golden Jackal Optimization (GJO) algorithm, developed by Chopra et al., draws inspiration from the population habits and predatory behavior of golden jackals. It is a metaheuristic algorithm that uses mathematical modeling to simulate the hunting behavior of golden jackal populations, encompassing prey search, tracking, surrounding, and attacking. In the GJO algorithm, each individual in the population represents an initial feasible solution. The algorithm iteratively updates this population, simulating the search, tracking, surrounding, and attacking behavior until the pack captures its prey, which constitutes the stopping condition. When this condition is met, there is no significant change between the previous and subsequent generations of the population, signifying that the optimal solution or optimal solution set has been found. The GJO algorithm comprises four main processes.

(1) Population initialization-Algorithm initialization
Like other metaheuristic algorithms, GJO distributes its initial population randomly across the search space. It can be defined as:

$$Y_0 = Y_{\min} + rand \cdot (Y_{\max} - Y_{\min}) \tag{1}$$

where $Y_0$ represents the initial population of golden jackals, $Y_{\max}$ and $Y_{\min}$ correspond to the upper and lower boundaries of the search space, respectively, and $rand$ denotes a random number within the range [0, 1].
In the GJO, the initial matrix of prey is defined as follows:

$$Prey = \begin{bmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,d} \\ y_{2,1} & y_{2,2} & \cdots & y_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n,1} & y_{n,2} & \cdots & y_{n,d} \end{bmatrix} \tag{2}$$

where $Prey$ represents the prey matrix, $y_{i,j}$ represents the value of the j-th dimension for the i-th prey, n denotes the number of prey, and d represents the dimensionality of the problem being solved.
During the algorithm's iterative process, the fitness value of each prey is calculated using an appropriate fitness function. The fitness values of all the prey can be expressed as follows:

$$F_{OA} = \begin{bmatrix} f(y_{1,1}; y_{1,2}; \cdots; y_{1,d}) \\ f(y_{2,1}; y_{2,2}; \cdots; y_{2,d}) \\ \vdots \\ f(y_{n,1}; y_{n,2}; \cdots; y_{n,d}) \end{bmatrix} \tag{3}$$

where $F_{OA}$ is the fitness value matrix of all prey and f is the fitness function.
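The initialization and evaluation steps can be sketched in a few lines of Python. This is a minimal illustration, assuming the uniform sampling form $Y_{\min} + rand \cdot (Y_{\max} - Y_{\min})$ applied independently to every coordinate; the function names and the `sphere` objective are hypothetical and are not taken from the paper.

```python
import random


def init_population(n, d, y_min, y_max, rng=random):
    """Draw n prey positions of dimension d uniformly inside [y_min, y_max]."""
    return [[y_min + rng.random() * (y_max - y_min) for _ in range(d)]
            for _ in range(n)]


def evaluate(prey, f):
    """Apply the fitness function f to every row of the prey matrix."""
    return [f(row) for row in prey]


def sphere(y):
    # Hypothetical objective used only for illustration.
    return sum(v * v for v in y)
```

Calling `evaluate(init_population(30, 10, -100.0, 100.0), sphere)` would return one fitness value per prey, which is the role played by the fitness value matrix.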

(2) Searching and tracking the prey-iterative search process
In nature, golden jackals can perceive and track prey autonomously. When a member of the population senses the presence of prey, the male jackal assumes the role of leader and guides the female jackal in the pursuit. This process can be modeled as follows:

$$Y_1(t) = Y_M(t) - E \cdot |Y_M(t) - rl \cdot Prey(t)| \tag{4}$$

$$Y_2(t) = Y_{FM}(t) - E \cdot |Y_{FM}(t) - rl \cdot Prey(t)| \tag{5}$$

where t represents the current iteration number, $Prey(t)$ is the prey position at the t-th iteration, $Y_M(t)$ and $Y_{FM}(t)$ represent the positions of the male and female jackals at the t-th iteration, and $Y_1(t)$ and $Y_2(t)$ represent their updated positions. E is the energy with which the prey evades the golden jackals, defined as:

$$E = E_1 \cdot E_0 \tag{6}$$

$$E_0 = 2r - 1 \tag{7}$$

$$E_1 = c_1 \cdot (1 - t/T) \tag{8}$$

where $E_1$ represents the decline of the prey's energy, $E_0$ is the initial energy state of the prey, r is a random number in [0, 1], $c_1$ is a constant equal to 1.5, and T represents the maximum number of iterations of the algorithm.
In Eq (4) and Eq (5), rl represents a random number generated from the Levy distribution, calculated as:

$$rl = 0.05 \cdot LF(y) \tag{9}$$

where LF is the Levy flight function, defined as:

$$LF(y) = 0.01 \times \frac{\mu \times \sigma}{|v^{1/\beta}|}, \qquad \sigma = \left( \frac{\Gamma(1+\beta) \sin(\pi\beta/2)}{\Gamma\left(\frac{1+\beta}{2}\right) \beta \, 2^{(\beta-1)/2}} \right)^{1/\beta} \tag{10}$$

where μ and v are random numbers in [0, 1] and β = 1.5.
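The two stochastic quantities of this stage can be sketched as follows. The sketch assumes the forms commonly given for GJO, namely $E = E_1 \cdot E_0$ with $E_0 = 2r - 1$ and $E_1 = c_1(1 - t/T)$, and $rl = 0.05 \cdot LF$ with a Mantegna-style Levy step; the function names and the division-by-zero guard are illustrative additions.

```python
import math
import random


def escape_energy(t, T, c1=1.5, rng=random):
    """E = E1 * E0: a random sign and magnitude that shrinks as t approaches T."""
    e0 = 2.0 * rng.random() - 1.0      # initial energy state in [-1, 1)
    e1 = c1 * (1.0 - t / T)            # linear decline from c1 to 0
    return e1 * e0


def levy_sigma(beta=1.5):
    """Scale factor sigma of the Levy flight for exponent beta."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta=1.5, rng=random):
    """rl = 0.05 * LF, with LF = 0.01 * mu * sigma / |v|^(1/beta)."""
    mu = rng.random()
    v = rng.random() or 1e-12          # guard against division by zero
    lf = 0.01 * mu * levy_sigma(beta) / abs(v) ** (1 / beta)
    return 0.05 * lf
```

Because $E_1$ decays linearly, $|E|$ never exceeds $c_1$ and reaches zero at the final iteration, which is what moves the search from tracking toward encirclement.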

(3) Surrounding and attacking the prey-iterative approximation process
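As time elapses, the prey's escape energy diminishes, and the population of golden jackals gradually encircles and attacks it. This process can be mathematically represented as follows:

$$Y_1(t) = Y_M(t) - E \cdot |rl \cdot Y_M(t) - Prey(t)| \tag{11}$$

$$Y_2(t) = Y_{FM}(t) - E \cdot |rl \cdot Y_{FM}(t) - Prey(t)| \tag{12}$$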

(4) Capturing the prey-algorithm termination
The population of golden jackals cooperatively surrounds and attacks the prey, eventually capturing it. This process can be described as follows:

$$Y(t+1) = \frac{Y_1(t) + Y_2(t)}{2} \tag{13}$$

where $Y(t+1)$ is the position of the golden jackal at the (t+1)-th iteration. When $Y_1(t)$ and $Y_2(t)$ no longer change significantly, that is, when $Y(t+1)$ and $Y(t)$ do not differ significantly, the golden jackals have captured the prey, the iteration terminates, and the algorithm has found the optimal solution.
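One position update of the basic GJO can be sketched as below. The sketch assumes the commonly cited update forms, in which the male and female jackals each propose a position and the new position is their average. The `explore` flag is an assumption of this sketch: it is passed in explicitly because the text above does not state the rule for switching between the search stage and the attack stage (implementations often compare $|E|$ with 1).

```python
def gjo_step(prey, male, female, energy, rl, explore):
    """Return the updated position Y(t+1) of one individual.

    prey, male, female: coordinate lists of equal length.
    energy, rl: scalars for this individual (a simplification).
    explore: True for searching/tracking, False for surrounding/attacking.
    """
    new_position = []
    for p, m, fm in zip(prey, male, female):
        if explore:
            y1 = m - energy * abs(m - rl * p)
            y2 = fm - energy * abs(fm - rl * p)
        else:
            y1 = m - energy * abs(rl * m - p)
            y2 = fm - energy * abs(rl * fm - p)
        new_position.append((y1 + y2) / 2.0)   # average of both proposals
    return new_position
```

With `energy = 0`, both proposals collapse onto the two leaders and the individual moves to their midpoint, which matches the convergence argument that the male and female jackals end up at the same position.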

GJO-GWO hybrid optimization algorithm based on multistrategy fusion
In the GJO algorithm, the individual search mechanism of the jackal serves as an efficient strategy for achieving rapid convergence. On the other hand, the collective behavior of the jackals ensures the algorithm's capability to approach the global optimum. Striking a balance between these two strategies is therefore paramount for swift convergence toward the global optimal solution. Nevertheless, according to the No-Free-Lunch theorem [82], no single approach can comprehensively address all problems. To enhance the convergence and optimization performance of the GJO algorithm, this paper introduces the wolf search strategy from the GWO algorithm and integrates the Lagrange interpolation method.

A variation of GJO and GWO
3.1.1 The GWO algorithm. The GWO is a classical metaheuristic algorithm that simulates the hunting behavior of a pack of grey wolves consisting of an α wolf, β wolf, δ wolf, and ω wolf. These wolves collaborate to search, track, and surround their prey. The algorithm is based on a mathematical model that emulates the hunting process of a grey wolf pack, aiming to optimize the objective function in the solution space iteratively. The primary model of the GWO algorithm is as follows:
(1) Predation and hunting model of grey wolves. In the GWO, wolves of different levels cooperate to search for prey. When the prey is found, the α wolf leads the wolves of other levels to track, surround, and attack the prey until it is captured. The mathematical model of this process is as follows:

$$D_P = |C_P \cdot X_P(t) - X(t)| \tag{14}$$

where t is the current iteration number; p = 1, 2, 3 represent the α wolf, β wolf, and δ wolf; $X_P(t)$ and $X(t)$ represent the positions of the prey and the grey wolves at the t-th iteration; and $D_P$ represents the distance between the α wolf, β wolf, and δ wolf and the prey in the t-th iteration ($D_P \to 0$ means that the grey wolves chase, gradually surround, and capture the prey). $A_P$ and $C_P$ are important parameters that control the hunting step of the grey wolves.
(2) The location update model of grey wolves. During the hunt, the wolves' positions are updated by a position update equation in which $X(t+1)$ represents the updated positions of the grey wolves after the t-th iteration, which correspond to the initial positions of the grey wolves at the (t+1)-th iteration. The algorithm iterates until there is no significant change in the positions of the grey wolves between two consecutive iterations. This indicates that the grey wolves have captured their prey, and the algorithm stops iterating.
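For orientation, the position update of the standard GWO can be sketched as below. This is an assumption-laden illustration rather than the paper's exact formulation: it uses the textbook choices $A = 2a \cdot r_1 - a$ and $C = 2 r_2$, the step $X_P - A \cdot D_P$, and the average of the three leader-guided candidates, with the control parameter `a` typically decreased from 2 to 0 over the run.

```python
import random


def gwo_update(x, leaders, a, rng=random):
    """Move one wolf toward the alpha, beta, and delta wolves.

    x: current position (list of floats).
    leaders: [alpha, beta, delta] positions, each the same length as x.
    a: control parameter, usually decayed from 2 to 0 over the iterations.
    """
    new_position = []
    for j, xj in enumerate(x):
        candidates = []
        for leader in leaders:
            A = 2.0 * a * rng.random() - a     # step-size coefficient
            C = 2.0 * rng.random()             # leader-weight coefficient
            D = abs(C * leader[j] - xj)        # distance to this leader
            candidates.append(leader[j] - A * D)
        new_position.append(sum(candidates) / len(candidates))
    return new_position
```

When `a` reaches 0, every candidate equals its leader's coordinate, so the wolf lands on the centroid of the three leaders; larger values of `a` allow steps that overshoot the leaders and support exploration.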

Introducing the α-wolf in the GJO algorithm.
The integration of the Grey Wolf Optimization (GWO) algorithm into the GJO algorithm is motivated by the Chinese idiom "Chai Lang Hu Bao." As social animals such as jackals and wolves engage in cooperative hunting, their collaboration enables them to search for and encircle prey across a wider spatial area. This collaborative effort compensates for individual differences, leading to a higher hunting success rate and an improved survival rate for the population. Although both wolf and jackal packs collaborate in hunting, they exhibit distinct hunting strategies. Jackal packs usually lack a strict hierarchical structure and rely on individual jackals forming groups to hunt collectively. In contrast, wolf packs are typically led by a dominant alpha wolf and adhere to a rigorous leadership hierarchy for coordinating group hunting activities.
To bolster the optimization performance of the GJO algorithm, this paper introduces the leadership strategy of the alpha wolf and the hierarchical structure concept from the GWO algorithm, disregarding competitive interactions between different populations. The alpha wolf is integrated into the GJO algorithm as a secondary-tier individual, distinct from the golden jackals. The alpha wolf's primary role is to aid the golden jackals in their search for and encirclement of prey.
Introducing the alpha wolf makes notable adjustments to the critical iterative process of the Improved Golden Jackal Optimization (IGJO) algorithm.These specific modifications are detailed below.
(1) Searching and tracking the prey-iterative search process.
(2) Surrounding and attacking the prey-iterative approximation process.
Introducing the alpha wolf to the GJO algorithm transforms the original hunting mechanism, which relies solely on the cooperation between golden jackals, into a collaborative hunting approach led by the golden jackal pair and the alpha wolf. As illustrated by Eqs (17) through (22), the inclusion of the alpha wolf extends the search radius of the initial population during the early stages (Eqs 17-19), thereby raising the chances of detecting the prey. It also heightens the effectiveness of surrounding and capturing the prey (Eqs 20-22). The alpha wolf, guided by the golden jackal pair, accelerates and tightens the encirclement of the prey, reducing the probability that the prey escapes the population's pursuit range. This, in turn, improves the hunting speed and precision of the population.
However, with the introduction of the alpha wolf, the IGJO algorithm incorporates the new parameters A and C, which increases algorithm complexity. To mitigate the potential impact of these new parameters, we substitute parameters A and C with parameters E and rl from the original GJO algorithm. As a result, Eq (19) and Eq (22) are rewritten in terms of E and rl.

Collaborative updating mechanism of GJO-GWO based on Lagrange interpolation
Incorporating the hierarchical structure from the GWO algorithm and introducing the alpha wolf in the IGJO algorithm necessitate a modification of the original position update equation (Eq (13)) used in the basic GJO algorithm. In the basic GJO algorithm, when the population positions of the previous generation and the current generation no longer differ, signifying convergence, both the male and female jackals should occupy the same position. Combining Eq (13), it can be observed that $Y_1(t)$, $Y_2(t)$, and $Y_3(t)$ should be equal in this case. However, with the introduction of the hierarchical structure of the grey wolf pack and the α wolf, using Eq (13) as the position update equation does not consider the α wolf's influence. This directly affects how precisely the improvement mechanism proposed in Section 3.1 can be applied to enhance the optimization performance of the GJO algorithm.
After the integration of the grey wolf optimization algorithm, the population consists of three fundamental elements: the male jackal, the female jackal, and the α wolf. The positions of these three elements can be simplified as three points ($Y_1(t)$, $Y_2(t)$, and $Y_3(t)$). To ensure the convergence of the position update equation, the constructed three-point iteration formula must guarantee that these points intersect. In addition, considering the cooperation and convergence among the male jackal, female jackal, and α wolf, there should be interactions among the three points. Based on this, we introduce the Lagrange three-point interpolation formula, which yields the new population update equation, Eq (25). In this equation, $Y_1(t)$, $Y_2(t)$, and $Y_3(t)$ represent the positions of the male jackal, the female jackal, and the α wolf in the t-th iteration, respectively. $Y(t)$ represents the population position after the last iteration, that is, the initial position of the current iteration. $Y(t+1)$ represents the population position after the iterative update. The constant 3 controls the convergence of the iterative equation and ensures that the male jackal, female jackal, and α wolf receive equal weights.
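As background for the three-point formula, the generic Lagrange interpolating polynomial through three nodes can be sketched as follows. This is the textbook construction only; it does not reproduce the paper's population update equation, whose exact form is not recoverable from the text here, and the function name is hypothetical.

```python
def lagrange3(xs, ys, x):
    """Evaluate the quadratic through (xs[k], ys[k]), k = 0..2, at x.

    Each basis polynomial equals 1 at its own node and 0 at the other two,
    so the three points interact through shared denominators.
    """
    (x1, x2, x3), (y1, y2, y3) = xs, ys
    l1 = (x - x2) * (x - x3) / ((x1 - x2) * (x1 - x3))
    l2 = (x - x1) * (x - x3) / ((x2 - x1) * (x2 - x3))
    l3 = (x - x1) * (x - x2) / ((x3 - x1) * (x3 - x2))
    return y1 * l1 + y2 * l2 + y3 * l3
```

The three basis weights always sum to one, which is the property that lets a three-point formula of this kind combine the male jackal, female jackal, and α wolf positions without rescaling the search space. The nodes must be pairwise distinct, otherwise a denominator vanishes.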
Incorporating the enhancements detailed in Section 3.1 and Section 3.2, the improved iteration process of the Golden Jackal Optimization algorithm, referred to as the Multi-Strategy Integrated Golden Jackal-Grey Wolf Hybrid Optimization Algorithm (GJO-GWO), is illustrated in the basic flowchart presented in Fig 1.

GJO-GWO algorithm execution
Combining the algorithm improvement mechanism with Fig 1, the pseudocode for the proposed GJO-GWO algorithm is detailed in Algorithm 1.

Algorithm 1
    …
    if f … f_M
    …
    Calculate the random number rl associated with the Levy function using Eqs (9) and (10)
    for each individual in the population
        Compute the energy function E of the prey evading the jackals and the wolf based on Eqs (6), (7) …
        …
        Update the population positions according to Eq (25)
    end for
    Boundary handling
    t = t + 1
end while
Output: Y_1(t) and f_M

3.4.2 Space complexity. For the GJO-GWO algorithm, initializing the golden jackal and grey wolf populations occupies the most space. Therefore, the space complexity of the GJO-GWO algorithm can be characterized as O(2×N×d).

Test case 1: GJO-GWO for benchmark functions
In this section, we describe the experimental results of the GJO-GWO algorithm on benchmark functions. Based on these results, we discuss the algorithm's ability to find optimal values and its convergence performance. Finally, we employ two statistical tests, Wilcoxon and Friedman, to validate the statistical significance of the GJO-GWO algorithm's superiority.
(2) Datasets. To validate the optimization performance of the GJO-GWO algorithm in solving complex functions, this study conducted numerical simulation experiments on eight benchmark functions. Detailed information on the datasets can be found in Table 2.
(3) Parameter settings. This paper employed specific parameter settings for each algorithm to ensure fair and objective comparisons in the experimental setup. The parameter configurations for each algorithm are presented in Table 3.

Evaluation metrics
1. Overall Optimization Results: The GJO-GWO algorithm demonstrates smaller mean values and standard deviations than the other algorithms across most benchmark functions. This indicates that the enhanced algorithm exhibits better convergence performance and optimization capability than its counterparts.
2. Optimization Results for Unimodal Functions (F1-F4): GJO-GWO consistently achieves smaller mean values and standard deviations than the nine other algorithms across the four unimodal functions. This is evidence of the algorithm's optimization performance and stability in locating global optima, and it underscores GJO-GWO's enhanced local exploitation capability on unimodal functions.
3. Optimization Results for Multimodal Functions (F5-F8): Except for a slightly lower performance on F6, GJO-GWO outperforms the other algorithms when optimizing multimodal functions. Consequently, in terms of the overall optimization results for multimodal functions, GJO-GWO exhibits better global exploration capability than its counterparts.
The experimental results show that the GJO-GWO algorithm converges when solving complex functions. With the exception of F6, it outperforms the competing algorithms in terms of mean values and standard deviations, highlighting its robust optimization and exploration capabilities. These outcomes support the efficacy of the GJO-GWO algorithm in function optimization tasks.
To compare the GJO-GWO algorithm visually with the other nine metaheuristic algorithms, we plotted the eight benchmark functions and the convergence curves of each algorithm over 1000 iterations, as depicted in Fig 2. As observed in Fig 2, the GJO-GWO algorithm exhibits the fastest and earliest convergence for both unimodal and multimodal functions. This observation suggests that the GJO-GWO algorithm possesses a higher convergence rate and better convergence accuracy on complex functions.
In summary, under uniform experimental conditions, the GJO-GWO algorithm, which combines multiple strategies from the Golden Jackal Optimization and Grey Wolf Optimization, surpasses the other nine metaheuristic algorithms in terms of both mean and standard deviation. This substantiates GJO-GWO's local exploitation capability and global exploration capability in solving unimodal and multimodal functions. Furthermore, the small standard deviation indicates GJO-GWO's robustness when optimizing complex functions.

Statistical test analysis.
To assess the optimization performance of the GJO-GWO algorithm comprehensively and objectively, this study employed two statistical tests. First, to assess whether the GJO-GWO algorithm differs significantly from the other algorithms, pairwise comparisons were conducted using the GJO-GWO algorithm as the control. The Wilcoxon rank-sum test [90] was performed at a significance level of 5%, and the corresponding p-values are presented in Table 5. In Table 5, the symbols '+' and '-' indicate whether the algorithm has a statistically significant advantage ('+') or not ('-'). The results in Table 5 reveal that the p-values obtained from the Wilcoxon rank-sum test for the GJO-GWO algorithm against all the other algorithms are smaller than 0.05. This signifies that the GJO-GWO algorithm has a significant advantage over the nine compared algorithms in optimization performance.
Second, because the Wilcoxon rank-sum test compares only two algorithms at a time, it is also necessary to evaluate the performance of each algorithm within the entire set. We therefore employed the Friedman test [91], a non-parametric test, to determine whether there were significant differences among the distributions of multiple algorithms. This test uses ranks to assess the overall optimization performance of the GJO-GWO algorithm across the eight benchmark functions and to identify significant differences among the observed data. The results of the Friedman test for the GJO-GWO algorithm are presented in Table 6. As shown in Table 6, the GJO-GWO algorithm achieves the highest rank among the ten algorithms, confirming its significant advantage over the nine compared metaheuristic algorithms.
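The rank-based logic of the Friedman test can be sketched in plain Python. The sketch assumes a minimization setting (a smaller result receives a better, lower rank), averages tied ranks, and omits the tie correction of the statistic; the function names and the layout of `scores` are illustrative.

```python
def average_ranks(scores):
    """Mean rank of each algorithm; scores[i][j] is algorithm j on problem i."""
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            rank = (pos + end) / 2.0 + 1.0     # tied entries share the mean rank
            for q in range(pos, end + 1):
                totals[order[q]] += rank
            pos = end + 1
    return [total / len(scores) for total in totals]


def friedman_statistic(scores):
    """Chi-square form: 12n / (k(k+1)) * sum((R_j - (k+1)/2)^2)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12.0 * n / (k * (k + 1)) * sum((r - (k + 1) / 2.0) ** 2 for r in ranks)
```

In practice a library routine would normally be preferred; for example, SciPy provides `scipy.stats.friedmanchisquare` and `scipy.stats.ranksums`, which also return p-values.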
In conclusion, under uniform constraint conditions, the GJO-GWO algorithm exhibits better overall metrics (lower mean and standard deviation) and better statistical metrics (Wilcoxon rank-sum p-values below the significance level and the best Friedman rank) than the nine compared algorithms. These findings underscore the GJO-GWO algorithm's enhanced local exploitation and global exploration capabilities.

Discussion
In Case 1, we conducted both convergence analysis and statistical tests to evaluate the performance of the GJO-GWO algorithm on benchmark functions of varying modes. Studying the experimental results of ten metaheuristic algorithms across different function modes gave a comprehensive picture of how the GJO-GWO algorithm performs on optimization problems.
The GJO-GWO algorithm performs better in finding optimal values, stability, convergence, and statistical significance. These results can be attributed to the introduction of the alpha wolf and the cooperative strategies among the alpha wolf, male jackal, and female jackal. During the initial iterations, the alpha wolf expands the algorithm's search space, increasing the likelihood that the population discovers prey and thereby improving the algorithm's chances of escaping local optima. In the later iterations, the cooperation between the alpha wolf, male jackal, and female jackal accelerates the population's updating process, helping the algorithm converge to the global optimum more rapidly.
However, as a hybrid optimization algorithm, the GJO-GWO algorithm exhibits a significant increase in both time and space complexity (as detailed in Section 3.4). The proposed algorithm therefore improves performance at the cost of higher computational resources. We consider this trade-off worthwhile as long as the additional cost stays within acceptable limits, and we regard the GJO-GWO algorithm as a meaningful algorithm enhancement.

Test case 2: GJO-GWO for feature selection
This section describes the application of the GJO-GWO algorithm to feature selection problems. First, we outline the specific implementation process of the GJO-GWO algorithm in the context of feature selection. Second, we assess the performance of the GJO-GWO algorithm on feature selection problems based on experimental results. Finally, we employ statistical analysis methods to confirm the performance of the GJO-GWO algorithm in feature selection tasks.

Binary conversion.
During the initialization phase, the GJO-GWO algorithm randomly generates an initial population of N candidate solutions, where each individual represents a feature subset to be evaluated. However, feature selection problems are typically binary discrete problems. Therefore, when using the GJO-GWO algorithm to select feature subsets for evaluation, it is necessary to map the feature vectors from continuous space to binary discrete space. In this transformation, $x_{binary}$ represents the feature value after binarization, and $x_{ij}$ indicates the real value, with $i = 1, 2, \cdots, N$ and $j = 1, 2, \cdots, D$.
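A continuous-to-binary mapping of this kind can be sketched as follows. The fixed threshold of 0.5 is an assumption made for illustration, since the paper's exact transformation rule is not recoverable from the text here; transfer-function variants (for example, sigmoid-based rules) are also common in wrapper methods.

```python
def binarize(position, threshold=0.5):
    """Map a real-valued position to a 0/1 mask (1 = feature selected).

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    return [1 if x > threshold else 0 for x in position]


def selected_indices(mask):
    """Indices of the features kept by a binary mask."""
    return [j for j, bit in enumerate(mask) if bit]
```

Applying `binarize` to each of the N individuals turns the continuous population into N candidate feature subsets of length D.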

Fitness function calculation.
Once the feature subsets are selected, their quality is determined by computing a fitness function. The fitness function is defined as:

$$Fitness = \alpha \, \gamma_R(D) + \beta \, \frac{|R|}{|C|} \tag{27}$$

where $\gamma_R(D)$ represents the KNN classification error rate, $|R|$ represents the length of the selected feature subset, and $|C|$ represents the total number of features in the dataset. $\alpha \in (0,1)$ represents the importance of classification quality, and $\beta = (1-\alpha)$ represents the importance of the subset length [2].
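The weighted fitness can be computed directly from its three inputs. In this sketch the default `alpha=0.99` is a hypothetical value chosen only so the function can be called without arguments; the text constrains α only to the interval (0, 1).

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted sum of classification error and relative subset size.

    error_rate: KNN classification error in [0, 1].
    n_selected: number of selected features, |R|.
    n_total: total number of features, |C|.
    alpha: weight of classification quality (hypothetical default).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    beta = 1.0 - alpha
    return alpha * error_rate + beta * n_selected / n_total
```

Lower values are better: with equal error rates, the subset with fewer features wins, and a larger α shifts the preference toward accuracy over compactness.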

Updating solutions.
Solution updating is a crucial component of any optimization algorithm for feature selection problems, and different algorithms employ different strategies. In this step, the GJO-GWO algorithm continually adjusts each selected solution using Eqs (17) through (24) in pursuit of improved solutions. Then, through Eq (27), the fitness of the new generation of feature subsets is evaluated to determine the best feature combinations. This process is repeated until the termination criterion is met. In this research, the termination criterion is reaching the maximum number of iterations, which allows the performance level of the GJO-GWO algorithm to be evaluated under a fixed budget.

Classification.
As a typical wrapper feature selection method, the approach based on GJO-GWO not only employs GJO-GWO to search for feature subsets but also requires a learning algorithm to evaluate these subsets, so that a high classification accuracy is maintained while the number of features is reduced. In this study, we used a KNN classifier (k = 5) as the learning algorithm to evaluate the feature subsets selected by the GJO-GWO algorithm and to assess classification accuracy. We adopted the hold-out method to partition the original dataset, randomly splitting it into two portions: 80% as the training set and 20% as the test set.
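The wrapper evaluation can be sketched with a small, dependency-free KNN. The sketch assumes Euclidean distance, majority voting, and a seeded random 80/20 hold-out split; the function names and the handling of an empty feature subset are illustrative choices rather than details from the paper.

```python
import math
import random
from collections import Counter


def holdout_split(X, y, test_ratio=0.2, seed=0):
    """Randomly split samples into training and test portions."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(round(len(idx) * (1 - test_ratio)))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def knn_predict(X_train, y_train, x, k=5):
    """Majority label among the k training samples closest to x."""
    nearest = sorted(range(len(X_train)),
                     key=lambda i: math.dist(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


def knn_error(X, y, mask, k=5, seed=0):
    """Hold-out error rate of KNN restricted to the features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0                       # an empty subset cannot classify
    Xs = [[row[j] for j in cols] for row in X]
    X_tr, y_tr, X_te, y_te = holdout_split(Xs, y, seed=seed)
    wrong = sum(knn_predict(X_tr, y_tr, x, k) != t for x, t in zip(X_te, y_te))
    return wrong / len(y_te)
```

The value returned by `knn_error` is the classification error term that enters the fitness function, so each candidate mask costs one train-and-test pass of the classifier.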

Experimental evaluation
In this section, we present the experimental results and discuss the performance of the proposed feature selection method based on the GJO-GWO algorithm. To achieve this, a set of ten UCI [92] classification datasets with multiple features and redundant information was selected for analysis under the same constraints.
(1) Datasets. To validate the effectiveness of applying the GJO-GWO algorithm to feature selection problems, numerical experiments were conducted on ten datasets from the UCI repository [92]. Detailed information about these datasets is presented in Table 7.
(2) Parameter settings. To provide a comprehensive and objective validation of the superiority and feasibility of the feature selection method based on GJO-GWO, the parameter settings for each algorithm are presented in Table 8.
Table 8 displays the parameter configurations for the comparative feature selection algorithms used in the experiments. These parameters encompass the population size, maximum number of iterations, and other algorithm-specific settings.

Evaluation metrics.
To comprehensively evaluate the GJO-GWO algorithm's performance in feature selection problems, we utilized the metrics listed in Table 9 to assess the algorithm [100].
In Table 9, Accuracy represents the classification accuracy of the algorithm on each dataset. TP, TN, FP, and FN refer to the numbers of true positives, true negatives, false positives, and false negatives, respectively. Acc represents the average classification accuracy across the repeated experiments. A higher average classification accuracy indicates better classification performance of the algorithm. Num represents the algorithm's average number of selected features. A lower average number of selected features indicates a more significant reduction in redundant information. Rt represents the average runtime of the algorithm. A smaller average runtime implies faster optimization performance. Acc(i) represents the classification accuracy in the i-th experiment, Fea(i) represents the number of selected features in the i-th experiment, and Runtime(i) is the runtime in the i-th experiment, where i ranges from 1 to 10, denoting the ten repeated experiments.
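As a small worked illustration of these metrics, the sketch below computes accuracy from confusion-matrix counts, assuming the standard definition (TP + TN) / (TP + TN + FP + FN), and averages the per-run values into Acc, Num, and Rt. The function names and all numbers are hypothetical and do not come from Tables 10 to 12; three runs are shown for brevity, whereas the experiments use ten.

```python
from statistics import mean


def accuracy(tp, tn, fp, fn):
    """Standard classification accuracy from confusion-matrix counts."""
    return (tp + tn) / (tp + tn + fp + fn)


def summarize(runs):
    """Average the per-run metrics over the repeated experiments."""
    return {
        "Acc": mean(r["acc"] for r in runs),
        "Num": mean(r["fea"] for r in runs),
        "Rt": mean(r["runtime"] for r in runs),
    }


print(accuracy(tp=45, tn=40, fp=5, fn=10))  # 0.85

# Made-up results of three repeated runs on one dataset.
runs = [
    {"acc": 0.90, "fea": 12, "runtime": 3.0},
    {"acc": 0.95, "fea": 10, "runtime": 2.5},
    {"acc": 0.85, "fea": 14, "runtime": 3.5},
]
summary = summarize(runs)
print(summary["Num"], summary["Rt"])  # 12 3.0
```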

Experimental results.
Based on the parameter settings in Table 8, we conducted numerical experiments to compare the performance of the feature selection method based on GJO-GWO with 13 other metaheuristic algorithms (ABC [39], ASO [63], BA [93], DE [94], GSA [95], MVO [55], PSO [96], SSA [58], GWO [42], EO [97], HHO [98], MPA [99], and GJO) on the ten classification datasets listed in Table 7. The experimental results are presented in Tables 10 and 11, which display the average number of selected features and the average classification accuracy of GJO-GWO compared to the 13 other algorithms on the ten classification datasets, all with k = 5. The best-performing values in the tables are highlighted in bold. Additionally, Figs 3 and 4 visually represent each algorithm's average number of selected features and average classification accuracy.
(1) Impact of a single indicator on algorithm performance. Regarding the average number of selected features, as indicated by Table 10 and Fig 3, the feature selection method based on MPA demonstrates superior overall performance. Conversely, the feature selection method based on GJO-GWO performs relatively worse than the other 13 comparison algorithms on most datasets. Nevertheless, it still reduces the number of selected features by half relative to the total number of features.
Regarding the average classification accuracy indicator, as shown in Table 11 and Fig 4, the feature selection method based on GJO-GWO achieves the best classification accuracy among all 14 algorithms on eight datasets, ranking first overall.Moreover, it attains 100% accuracy on five datasets (D2, D3, D4, D8, and D9).

(2) Impact of multiple indicators on algorithm performance. When considering both the average number of selected features and the average classification accuracy, the feature selection method based on GJO-GWO exhibits suboptimal performance in terms of the average number of selected features but demonstrates superior performance in classification accuracy. This aligns well with the goal of feature selection, which is to balance the number of selected features against high classification accuracy. Therefore, considering the combined influence of these two metrics, the feature selection method based on GJO-GWO outperforms the other 13 algorithms.
In summary, if one considers only the impact of a single metric, the performance of the feature selection method based on GJO-GWO is moderate. However, when the interplay between the two metrics is considered comprehensively, the performance of the proposed algorithm stands out as optimal. The apparent inconsistency arises because an algorithm that identifies fewer features does not necessarily achieve higher classification accuracy, let alone superior overall performance. This highlights that being locally optimal at every step does not guarantee global optimality. An algorithm can be deemed superior only if it selects an appropriate number of features while ensuring optimal classification accuracy.
At first glance, the objective of the feature selection problem may seem straightforward: minimize the number of selected features while maximizing classification accuracy. However, when we consider storage space constraints, the motivation behind choosing fewer features becomes clear: it is to ensure faster algorithm execution within the same experimental environment. Consequently, in the context of solving the feature selection problem, the number of selected features is closely linked with an algorithm's runtime, and examining the runtime of each algorithm becomes essential for performance assessment. To validate that the GJO-GWO-based feature selection method achieves both superior classification accuracy and faster runtime, we recorded the average runtime of each algorithm across ten experiments on the ten classification datasets. These data are presented in Table 12, with the best-performing values highlighted in bold. Fig 5 provides a visual representation of the average runtime of each algorithm.
Combined with the insights from Table 12 and the visual representation in Fig 5, it becomes evident that the feature selection method utilizing GJO-GWO boasts a significantly reduced runtime compared to the comparative algorithms across all ten datasets.Remarkably, even as dataset sizes grow (measured by the number of features multiplied by the number of instances), the algorithm's runtime remains notably lower than that of the comparative algorithms.Considering the findings from Table 11 and Fig 4, it is clear that the proposed feature selection method based on GJO-GWO achieves enhanced computational efficiency and superior classification accuracy.
In conclusion, by considering the collective impact of three pivotal indicators: average selected feature count, average classification accuracy, and average runtime, it is evident that the feature selection method based on GJO-GWO excels in addressing the feature selection problem.

Convergence curve analysis.
The convergence curve depicts how the fitness value evolves over the iterations, offering insights into the convergence and stability of an algorithm during the optimization process. Therefore, this section analyzes the convergence curves in detail. Fig 6 presents the convergence curves of 13 metaheuristic algorithms for feature selection. By examining the convergence curves in Fig 6, it becomes evident that, except for D6, D7, and D10, the GJO-GWO algorithm achieves faster convergence to the optimal solution on the other datasets. This observation underscores that the GJO-GWO algorithm excels in terms of convergence speed and accuracy in most cases, further highlighting its strong optimization performance.

Statistical analysis.
To comprehensively evaluate the GJO-GWO-based feature selection method's performance concerning three key metrics (average selected feature count, classification accuracy, and runtime), we employed the following approach:
1. Comprehensive Ranking: We initiated the analysis by computing comprehensive rankings for the various algorithms across these three metrics based on data from Tables 10-12. The summarized rankings are presented in Table 13.
2. Radar Chart Visualization: Subsequently, we used the rankings from Table 13 to create radar charts, exemplified in Fig 7. These radar charts visually depict algorithmic rankings across the performance metrics, with a smaller enclosed area indicating superior performance.

When the three indicators are weighted equally, the triangle area formed by the feature selection method based on EO is smaller than those of the other 13 comparative algorithms, and the feature selection method based on GJO-GWO ranks third. However, when average classification accuracy and average runtime are considered the primary influencing factors, the performance of the feature selection method based on GJO-GWO is optimal. Regarding the average number of selected features, the MVO algorithm exhibits the best performance, but at the cost of worse average classification accuracy and average runtime.

From the optimal fitness boxplots in Fig 8, it can be observed that the feature selection method based on GJO-GWO exhibits a distribution of optimal fitness values in a favorable and narrow range across all ten datasets. This indicates that the improved algorithm possesses better search performance and demonstrates superior stability in finding optimal feature subsets.
In summary, considering the combined impact of the three metrics on algorithm performance, the higher average classification accuracy, shorter average running time, and more reasonable selection of feature subsets validate that the feature selection method based on GJO-GWO not only achieves faster search but also demonstrates more robust stability in solving feature selection problems.
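The enclosed-area criterion used to read the radar chart in Fig 7 can be made concrete with a short calculation. The sketch below assumes the axes are equally spaced in angle, so that for three metrics the enclosed triangle has area (√3/4)(r1·r2 + r2·r3 + r3·r1); the function name `radar_area` and the rank triples are hypothetical and are not taken from Table 13.

```python
import math


def radar_area(ranks):
    """Area of the polygon spanned by rank values on equally spaced axes."""
    n = len(ranks)
    if n < 3:
        raise ValueError("a radar polygon needs at least three axes")
    angle = 2.0 * math.pi / n  # angle between neighbouring axes
    pair_sum = sum(ranks[i] * ranks[(i + 1) % n] for i in range(n))
    return 0.5 * math.sin(angle) * pair_sum


# Made-up ranks on (feature count, accuracy, runtime); lower rank is better.
candidates = {"algorithm_A": (1, 5, 3), "algorithm_B": (2, 2, 2)}
areas = {name: radar_area(r) for name, r in candidates.items()}
best = min(areas, key=areas.get)
print(best)  # algorithm_B
```

The example also shows why the area can favour a balanced algorithm over one that ranks first on a single metric: the area depends on products of neighbouring ranks, so one poor rank inflates two of the three terms.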

Wilcoxon rank-sum test:
To better analyze the superiority of the GJO-GWO algorithm in feature selection, we recorded the average fitness of each algorithm for feature selection tasks. Subsequently, a rank-sum test was conducted for statistical analysis, and the results are presented in Table 14.
According to the statistical results in Table 14, when pairwise comparisons are made with the other algorithms, the rank-sum test p-values for the GJO-GWO algorithm are less than 0.05 for most datasets. This indicates that, on those datasets, GJO-GWO holds a statistically significant advantage over the 13 comparison algorithms. This result underscores the strong performance of GJO-GWO in feature selection problems and supports its reliability and effectiveness in practical applications.
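For readers who wish to reproduce this kind of pairwise comparison, the following sketch implements a two-sided Wilcoxon rank-sum test using the normal approximation, with average ranks for ties and no tie or continuity correction. The function name `rank_sum_test` and the fitness samples are hypothetical; they are not the values behind Table 14, and the exact test variant used in the experiments is not specified here.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p_value). Ties receive average ranks.
    """
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    mu = n1 * (n1 + n2 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_a - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Made-up best-fitness values from ten runs of two algorithms (lower is better).
proposed = [0.021, 0.023, 0.020, 0.022, 0.024, 0.019, 0.025, 0.018, 0.026, 0.027]
baseline = [0.041, 0.043, 0.040, 0.042, 0.044, 0.039, 0.045, 0.038, 0.046, 0.047]
z, p = rank_sum_test(proposed, baseline)
print(p < 0.05)  # True
```

With only ten runs per algorithm the normal approximation is rough; SciPy's `scipy.stats.ranksums` and `scipy.stats.mannwhitneyu` are library alternatives.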

Discussion
In Case 2, we conducted an in-depth investigation into the application of the GJO-GWO algorithm in combination with the KNN classifier for feature selection problems. We analyzed the experimental results on various complex datasets for 13 metaheuristic algorithms. Through this analysis, we gained a comprehensive understanding of the performance of the GJO-GWO algorithm when applied to feature selection problems. The specific findings are summarized below. First, among the 13 metaheuristic algorithms applied to feature selection problems, the GJO-GWO algorithm demonstrates exceptional exploratory and exploitative performance. Its superior performance on high-dimensional datasets highlights its versatility in addressing FS problems. Additionally, the KNN-based GJO-GWO algorithm achieves higher classification accuracy and exhibits faster convergence on most datasets compared to other optimization algorithms. Lastly, the shorter average runtime implies that the KNN-based GJO-GWO algorithm is well-suited for swiftly solving complex feature selection problems.
While the KNN-based GJO-GWO algorithm generally produces efficient results for feature selection tasks, experiments reveal that it does not excel in the number of features selected. This highlights the inherent challenge that feature selection algorithms face in maintaining high classification accuracy while reducing the number of features. It also emphasizes the trade-off between the number of features and classification accuracy: selecting the most suitable feature selection method based on specific requirements can yield better results. Additionally, as the optimization results are inherently based on non-exact but repeatable processes, applying the GJO-GWO algorithm in various scenarios or problems may result in different feature subsets. Finally, it is essential to note that the KNN-based GJO-GWO feature selection algorithm, being a classical wrapper-based feature selection method, may exhibit variations in runtime, classification accuracy, and the number of selected features when used with different classifiers.

Conclusion and future directions
This research has centered on improving the optimization performance of the GJO algorithm in order to address feature selection problems. Through mechanistic analysis and numerical experiments, the following conclusions have been drawn:
1. A multi-strategy integrated Golden Jackal-Grey Wolf Hybrid Optimization Algorithm (GJO-GWO) has been proposed. Compared with nine other metaheuristic algorithms on eight benchmark functions, the proposed GJO-GWO exhibits significant advantages in terms of convergence and stability. These advantages mainly manifest in two aspects: a) Introducing the Grey Wolf Optimizer increases solution diversity. b) The position update strategy based on Lagrange interpolation enhances the algorithm's convergence performance.
2. An efficient feature selection method based on GJO-GWO for classification tasks has been provided. On ten high-dimensional datasets, when compared to 13 state-of-the-art feature selection techniques, the proposed feature selection method demonstrates significant advantages in terms of accuracy, convergence speed, and runtime. These advantages mainly manifest in two aspects: a) Introducing the Grey Wolf Optimizer enhances solution diversity and improves the algorithm's runtime efficiency due to its programming framework. b) The position update strategy based on Lagrange interpolation effectively increases the algorithm's convergence speed. The integration of these strategies allows the algorithm to adaptively adjust the balance between exploration and exploitation at different search stages.
Despite the overall better performance of the feature selection method based on GJO-GWO presented in this paper, there is still room for improvement. For instance, it often selects a relatively large number of features during feature selection for classification, which might not be ideal for subsequent machine learning or deep learning tasks. Therefore, we plan to conduct further research to address these issues, as outlined below:
1. We intend to design a corresponding optimization and development framework around the GJO-GWO algorithm. This framework will be suitable for handling single or multi-objective optimization problems such as real-time feature selection, autonomous intelligent scheduling, image threshold segmentation, power system dispatch optimization, and neural network architecture search.
2. We aim to build a data mining and analytics system based on GJO-GWO. We will explore using the GJO-GWO algorithm or a combination of various metaheuristic algorithms as the underlying algorithms to create an integrated data mining and analytics system that encompasses feature engineering, parameter optimization, machine learning, deep learning, and decision optimization. This system will facilitate rapid analysis of real-world engineering application problems.

Fig 7 visually portrays algorithm rankings across the three essential performance indicators. The radar chart, as displayed in Fig 7, is constructed using the complete ranking data from Table 13.

Fig 5. Average runtime of different algorithms. https://doi.org/10.1371/journal.pone.0295579.g005

Boxplots: We conducted ten independent experiments for each algorithm to record the optimal fitness values in solving the feature selection problem. These values are showcased in boxplots in Fig 8.

Table 13 and Fig 7 are vital tools for assessing algorithmic performance based on average selected feature count, classification accuracy, and runtime. Fig 7, the radar chart, is crafted to visually represent the ranking outcomes from Table 13. A smaller enclosed area within the radar chart signifies superior algorithmic performance.